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Abstract 

The FFT EnKF data assimilation method is proposed and apphed to a stochastic 
cell simulation of an epidemic, based on the S-I-R spread model. The FFT EnKF 
combines spatial statistics and ensemble filtering methodologies into a localized and 
computationally inexpensive version of EnKF with a very small ensemble, and it is 
further combined with the morphing EnKF to assimilate changes in the position of the 
epidemic. 

1 Introduction 

Starting a model from initial conditions and then waiting for the result is rarely satisfactory. 
The model is generally incorrect, data is burdened with errors, and new data comes in 
that needs to be accounted for. This is a well-known problem in weather forecasting, and 
techniques to incorporate new data by sequential statistical estimation are known as data 
assimilation [Ij. The ensemble Kalman filter (EnKF) [2j is a popular data assimilation 
method, which is easy to implement without any change in the model. The EnKF evolves 
an ensemble of simulations, and the model only needs to be capable of exporting its state 
and restarting from the state modified by the EnKF. However, the ensemble size required 
can be large (easily in the hundreds), the amount of computations in the EnKF can be 
significant, special localization techniques need to be employed to suppress spurious long- 
range correlations in the ensemble covariance matrix, and the EnKF does not work well for 
problems with sharp coherent features, such as the travelling waves found in epidemics and 
wildfires. 

We propose a variant of EnKF based on the Fast Fourier transform (FFT), which reduces 
significantly the amount of computations required by the EnKF, as well as the ensemble 
size. The use of FFT is inspired by spatial statistic: FFT EnKF assumes that the state 



1 



approximately a stationary random field, that is, the covariance between two points is mainly 
a function of their distance vector. Then the multiplication of the covariance matrix and a 
vector is a convolution. In addition, the morphing transform [3j is used here so that changes 
of the state both in position and in amplitude are possible. 

The FFT EnKF with morphing is illustrated here for tracking a simulated epidemic wave. 
The use of data assimilation techniques can increase the accuracy and reliability of epidemic 
tracking by using the data as soon as they are available, and some applications of data 
assimilation in epidemiology already exist [U [5] . The FFT EnKF with morphing has the 
potential to reduce complicated simulations and accurate real-time use of data to a laptop 
or a smartphone in the field. 

For FFT EnKF in a wildfire simulation, see [6j. The Fourier domain Kalman filter 
(FDKF) p^J consists of the Kalman filter used in each Fourier mode separately. 

The covariance of a stationary random field can be estimated from a single realization 
by the covariogram [8], which can be computed efl&ciently by the FFT [9j. We propose to 
use the covariogram for an EnKF with an ensemble of one^ which will be further developed 
elsewhere. 

2 FFT EnKF 

The EnKF approximates the probability distribution of the model state u by an ensemble of 
simulations Ui^ . . . ^u^- Each member is advanced by the simulation in time independently. 
When new data d arrives, it is given as data likelihood d ^ N {Hu^R)^ where H is the 
observation operator and R is the data error covariance matrix. Now the forecast ensemble 
[uk] is combined with the data by the EnKF analysis ^0] 

ul = Uk + CnH^ {HCnH^ + R)'^ [d + Ck- HuC) , A; = 1, . . . , TV, (1) 

to yield the analysis ensemble [uf\. Here, Cn is an approximation of the covariance C 
of the model state, taken to be the covariance of the ensemble, and ek is sampled from 
(0, R). The analysis ensemble is then advanced by the simulations in time again. In [llj, 
it was proved that the ensemble converges for large to a sample from the Kalman filtering 
distribution when all probability distributions are Gaussian. Of course, the EnKF is used 
for more general cases as well. 

When Cn 18 the ensemble covariance, the EnKF formulation ([T]) does not take advantage 
of any special structure of the model. This allows a simple and efficient implementation 
[T2], but large ensembles, often over 100, are needed In an application, variables in the 
state are random fields^ and the covariance decays with spatial distance [8j. Tapering is 
the multiplication of sample covariance term-by-term with a fixed decay function that drops 
off with the distance. Tapering improves the accuracy of the approximate covariance for 
small ensembles [13], but it makes the implementation of ([T]) more expensive: the sample 
covariance matrix can no longer be eflSciently represented as the product of two much smaller 
dense matrices, but it needs to be manipulated as a large, albeit sparse, matrix. Random 
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fields in geostatistics are often assumed to be stationary, that is, the covariance between two 
points depends on their spatial distance vector only. 

The FFT EnKF discussed here uses a very small ensemble, but larger than one. We 
explain the FFT EnKF in the ID case; higher-dimensional cases are exactly the same. 
Consider first the case when the model state consists of one block only. Denote by u{xi)^ 
i = 1, . . . , n the entry of vector u corresponding to node x^. If the random field is stationary, 
the covariance matrix satisfies C (xi^Xj) = c{xi — Xj) for some covariance function c, and 
multiplication by C is the convolution 



U [Xn 



'^u{Xj)c{Xi-Xj) 



1, . . . ,n. 



After FFT, convolution becomes entry- by-entry multiplication of vectors, that is, multipli- 
cation by a diagonal matrix. 

We assume that the random field is approximately stationary, so we neglect the off- 
diagonal terms of the covariance matrix in the frequency domain, which leads to the the 
following FFT EnKF method. First apply FFT to each member, Uk = Fuk- Next, approx- 
imate the forecast covariance matrix in the frequency domain by the diagonal matrix with 
the diagonal entries given by 



N 



k=l 



Ui 



where m 



1 ^ 

k=i 



(2) 



Then define approximate covariance matrix Cn hj term- by-term multiplication • in the 
Fourier domain 

u = Cnv <^=^ u = Fu^ v = c9u^ V = F~^v^ {c9u). = CiUi. 
When H = I and R = rl^ the evaluation of ([T]) reduces to 

ul = Uk + c • (c + r)"-^ • (^d + ek- u^) • (3) 

In general, the state has more than one variable, and C, and H have the block form 

(4) 









■(7(11) .. 








u = 




, c = 


(7(Mi) . . 


(j{MM) 


, H = [HW . . 





Here, the first variable is observed, so i/^^^ = /, i/^^^ = 0,. . . , i/^^) = 0, and ^ becomes 

-1 



and in the frequency domain 



R 



d + ek-ul 



(1) 



Uk 



,M, 



(5) 



(6) 
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The cross-covariance between field j and field 1 is approximated by neglecting the oflF-diagonal 
terms of the sample covariance in the frequency domain as well, 

?!'" = F^E where sf' = l^S»f=l.i. (7) 

k=l k=l 

In the computations reported here, we have used the real sine transform, so all numbers 
in ([7]) are real. Also, the use of the sine transform naturally imposes no change of the state 
on the boundary. 



3 Morphing EnKF 

Given an initial state the initial ensemble in the morphing EnKF [3l [12] is given by 

4^={u^S+i + r'i:^)o{I + n), k = l,...,N, (8) 

with an additional member u^^i = called the reference member. In ([s]), r^^^ are random 
smooth functions on Tj. are random smooth mappings : Q ^ and o denotes 
composition. Thus, the initial ensemble varies both in amplitude and in position, and the 
change position is the same in all blocks. The random smooth functions and mapping are 
generated by FFT as Fourier series with random coefficients with zero mean and variance 
that decays quickly with frequency. 

The data d is an observation of u^^\ The first blocks of all members Ui^ . . . ^u^ and d 
are then registered against the first block of u^+i as 

4'^ - ^n\i ^{I + Tk). ^ 0, VT, ^ 0, = 0, . . . , TV, 

u^q"^ = d and Tk : f^^f^, /c = 0,...,A^ are called registration mappings. The registration 
mapping is found by multilevel optimization. The morphing transform maps each ensemble 
member Uk into the extended state vector, the morphing representation^ 

Uk^Uk = Muj^^, {uk) = (Tk, r^k^, . . . , r[^^^ , (9) 

where r^"^^ = u^^"^ o (/ + Tk)~^ — ^ ^ 0, . . . , A^, are registration residuals. Likewise, 

the extended data vector is defined by d d = ^Tq, Tq^^^ and the the observation operator 

is (r,r(i), . . . ,r(^)) ^ (r,r(i)). We then apply the FFT EnKF method g is applied to 
the transformed ensemble Si, . . . ,Un. The covariance C^^^"^ in ^ consists of three diagonal 
matrices and we neglect the off-diagonal blocks, so the fast formula ^ can be used. The 
analysis ensemble . . . , u^^i is obtained by the inverse morphing transform 

= M-^, K) = (4Vi + rf^) ° + , k = l,...,N + l, (10) 
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where the new transformed reference member is given by 



1 ^ 

^N+l = J^Y.^l- (11) 

k=l 



4 Epidemic model 

The epidemic model that we used for this study is a spatial version of the common S-I-R 
dynamic epidemic model. A person is susceptible or infectious in this context if he or she 
can contract or transmit the disease, respectively. The removed state includes those who 
have either died, have been quarantined, or have recovered from the disease and become 
immune. The state variables are the susceptible {S)^ the infectious (/), and the removed {R) 
population densities. The core ideas for this model date back to the 1957 spatial formulation 
by Bailey [T4], but the specific version that we have employed here is due to Hoppenstaedt 
[ISl p. 64]. 

The population is considered to be dispersed over a planar domain C M^, and it 
is labelled according to its position with respect to the spatial coordinates x and y. The 
(deterministic) evolution of the state [S (t) ^ I (t) ^ R (t)) is given by 



^^^1^ = S{x,y,t) J J w (x, y, u, v) I {u, v, t) dudv - q (x, y, t) I (x, t) , > (12) 



'-^^ = q,{x^y^t)I{x^y^t). 

The function q (x, ^, t) gives the rate of removal of infectives due to death, quarantine, 
or recovery. The weight function w (x^y^u^v) measures the infiuence of infectives at spatial 
position (u^v) on the exposure of susceptibles at position (x,y); in this simulation we used 
the function w (x, u^v) = a exp [-{{x - uy + {y - vyy/yx] , which expresses the idea that 
the infiuence of nearby infectives decays as an exponential function of Euclidean distance, 
with constant A, characteristic of the distance at which the disease spreads. More mobile 
societies will have larger values of A. The parameter a measures the infectiousness of the 
disease. 



A stochastic cell model is created by treating the quantities on the right-hand-side of (12) 
as the intensities of a Poisson process and by piecewise constant integration over the cells. 
The domain Q is decomposed into nonoverlapping cells Qi with centers {xi^yi) and areas 
A (r^i), i = 1, . . . , The state in the cell Vti is the random element (aS^, 7^, i?^), advanced in 
time over the interval [t, t + At] by 

Si{t + /\t) = Si{t)-ASi, Ii{t + At) = Iiit) + ASi-ARi, {t + At) = Ri (t) + AR^, 
where the random increments ASi and ARi are sampled from 

ASi ~ Pois (^Si (t) Y:f=iw (xi, Vi, X,-, y^) Ij (t) A (a) At) , (13) 
ARi ~ Pois (qi (t) li {t) A {Qi) At) , 
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and qi (t) is the given removal rate in the cell Qi. The summation in (13) is done only over 
the cells Qj near Qi] for far away cells, the weights w {xi^i/i^ Xj^yj) are negligible. It is not 
necessary to compute a Poisson-distributed transmission rate from each source cell to a given 
target cell, because a finite sum of independent Poisson-distributed random variables, each 
with its own intensity parameter, is itself Poisson-distributed with an intensity parameter 
equal to the sum of the individual intensities. 



5 Computational results 

We have chosen to model an epidemic disease that first emerges in Congo. The computational 
domain is a square portion of central Africa. In Figure [l] (a), we see the epidemic wave 120 
model time steps after the emergence of the disease. The behavior of the model is such that 
any spurious infection will tend to grow into a secondary infection wave. This is problematic 
for data assimilation because the occurrence of spurious features is virtually guaranteed. 
We attempt to reduce the occurrence and magnitude of these features using the morphing 
transformation and FFT EnKF; however, some amount of residual artifacts will remain. 
We have found that by processing the model state in the following manner, we can further 
reduce these artifacts. We begin by scaling the absolute quantities contained in the model 
variables to a percentage of the local population before performing the data assimilation. 
After data assimilation, we truncate the variables to the range [0, 1], and we apply a threshold 
so that any infection rate below 1% is set to 0. Finally, we rescale the output in absolute 
units ensuring that the number of people at each grid cell is preserved. We have applied 
the FFT EnKF to the epidemic model described in Section |4] with an ensemble of size 5. 
Each ensemble simulation was started with the same initial conditions, but with different 
random seeds, and advanced in time by 100 model time units, then perturbed randomly 
to obtain the initial ensemble. The analysis ensemble and data were advanced in time an 
additional 20 model time steps for further assimilation cycles. In total, 3 assimilation cycles 
were performed in this manner. 

We have perturbed each member of the initial ensemble randomly in space by applying 



(10) to the each variable of the morphing representation of the model. The mappings 
for this perturbation were generated from a space of smooth functions that are zero at the 
boundary. While the residuals Tk are customarily initialized to smooth random fields as well, 
we have chosen to set r^^^ = to avoid spurious infections. We instead multiply each field 
after the inverse morphing transform by 1 + where Sk is another smooth random field. 
This ensures that an initial infection rate of is unchanged by the perturbation. A part of 
a typical ensemble with spatial as well as amplitude variability is shown Figure [l] (b) . 

The output of the observation function used in this example consists of the Infected field 
of the model. In this case, the data is a spatial "image" of the number of infected persons in 
each grid cell. The data were generated synthetically from a model simulation, which was 
initialized in the same manner as the ensemble. 

Four variants of the EnKF were then applied: the standard EnKF and FFT EnKF and 
morphing EnKF and morphing FFT EnKF. The same initial ensemble and the same data 
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Susceptible, forecast member 2, cycle 1 



Infected, forecast member 1, cycle 1 
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Removed, forecast member 2, cycle 1 
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(a) 



5°S 




20°E 25°E 30°E 35°E 40°E 
longitude 

Infected, forecast member 2, cycle 1 




20°E 25°E 30°E 35°E 40°E 
longitude 

Infected, forecast member 3, cycle 1 
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longitude 



(b) 



Figure 1: (a) The number of people per kilometer squared infected, susceptible, and removed 
after 120 time steps in a simulation of an epidemic disease spreading through central Africa. 
These images correspond to variables /, S', and R in Equation (12). (b) Number of people 
infected per kilometer squared in three forecast ensemble members. 
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were used for each method. The deviation of the initial ensemble and the model error were 
chosen so that the analysis should be about half way between the forecast and the data. In 
the morphing variants, the data deviation in the amplitude was taken very large, so that 
the filter updates essentially only the position. Ensemble of size 5 was used. The result in 
the first assimilation cycle for each method is shown in Figures [2] and jS) The first image 
in each column is the forecast mean. In the morphing variants, the mean is taken over all 
ensemble members in all fields of the morphing representation ^ and it plays the role of 
the comparison state for registration. Thus, in the morphing variants, both the amplitude 
and the position of the infection wave in the ensemble members are averaged. The second 
image in each column is the data, which is a model trajectory started from the same initial 
state for each method. Because the model is itself stochastic, the data images are slightly 
different. The third image in each column is the analysis mean, which is taken in the 
morphing representation (11) for two morphing filters, so that both the amplitude and the 
location are averaged. 

We see that both standard EnKF and FFT EnKF filters cannot move the state towards 
the data; a much larger ensemble would be needed. The morphing EnKF does move the 
state towards the data, but there are strong artifacts due to the poor approximation of the 
covariance by the covariance of the small ensemble. Finally, the morphing FFT-EnKF is 
capable of moving the state towards the data better. 



6 Conclusion 

We have introduced morphing FFT EnKF and presented preliminary results on data assim- 
ilation for an epidemic simulation. Morphing was essential to move the state towards the 
data, but it resulted in artifacts for the small ensemble size used, yet small ensemble size 
is important to perform simulations with data assimilation on general computing devices 
instead of supercomputers. We have observed that the estimation of the covariance matrix 
in the frequency domain results in better forecast covariance in the algorithm, which has the 
potential to reduce the artifacts due to small ensemble size. 
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Infected forecast mean, cycle 1 



Infected forecast mean, cycle 1 
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Infected, data, cycle 1 
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(b) 



Figure 2: The number of people infected per kilometer squared in analysis cycle 1 using the 
standard EnKF and FFT EnKF, each with ensemble size of 5. Both approaches are unable 
to move the location of the infection in the simulation state. 
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Infected forecast comparison, cycle 1 



Infected forecast comparison, cycle 1 
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Figure 3: The number of people infected per kilometer squared in analysis cycle 1 using the 
morphing EnKF and morphing FFT EnKF, each with ensemble size of 5. Both approaches 
are able to move the state spatially and perform similarly. However, EnKF suffers from 
stronger artifacts due to low accuracy and low rank of the ensemble covariance than the 
morphing FFT EnKF. 
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